Emoticons vs. Emojis on Twitter: A Causal Inference Approach
Authors
Abstract
Online writing lacks the non-verbal cues present in face-to-face communication, which provide additional contextual information about the utterance, such as the speaker’s intention or affective state. To fill this void, a number of orthographic features, such as emoticons, expressive lengthening, and non-standard punctuation, have become popular in social media services including Twitter and Instagram. Recently, emojis have been introduced to social media, and are increasingly popular. This raises the question of whether these predefined pictographic characters will come to replace earlier orthographic methods of paralinguistic communication. In this abstract, we attempt to shed light on this question, using a matching approach from causal inference to test whether the adoption of emojis causes individual users to employ fewer emoticons in their text on Twitter.
Introduction
People are changing how they write to express themselves in online settings, often through the use of non-standard orthographies, such as emoticons (e.g., (:) and letter repetitions (e.g., coooolll) (Dresner and Herring 2010; Kalman and Gergle 2014). The introduction of emojis is a potentially dramatic shift in online writing, as it may replace these user-defined linguistic affordances with predefined graphical icons. With the ability to access a large number of colorful and expressive emoji pictographs, will users stop employing non-standard orthographies for expressive communication in social media? In this abstract, we address the question of whether individual users’ adoption of emojis reduces the frequency of emoticons used in their tweets. From a sample of mostly English tweets, we extracted authors who were early adopters of emojis, and considered them the treatment group.
To measure the causal effect of emoji adoption on emoticon usage, we chose another set of authors (the control group) who were not yet using emojis at the same time as the treatment group, and compared the changes in emoticon usage of the two groups over a period of one year. We matched each author in the treatment group with an author in the control group, based on their emoticon usage rate before the treatment period. If the individuals in the treatment group reduce their emoticon usage more than the individuals in the control group, this would suggest that emojis are competing with emoticons, and may eventually reduce the amount of nonstandard orthography in social media.
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Figure 1: Examples of emoji characters used in Twitter (created using http://www.iemoji.com)
Emojis are “picture characters” that originated for mobile phones in Japan in the late 1990s, but recently became popular worldwide in text messaging and social media with the adoption of smartphones supporting input and rendering of emoji characters. In contrast to emoticons, which are created from ASCII character sequences, emojis are represented by Unicode characters, and are continuously increasing in number with the introduction of new characters in each new Unicode version [1]. Emoji characters include not only faces, but also concepts and ideas such as weather, vehicles and buildings, food and drink, or activities such as running and dancing (Figure 1, and example tweets in Figure 2). Emoji Tracker reports real-time emoji use on Twitter [2]. In computer-mediated communication (CMC), emoticons are interpreted as “emotion icons”, primarily as a way to represent facial expressions, such as a smile, in the absence of non-verbal cues (Walther and D’Addario 2001).
However, later research has shown that emoticons are not just representations of affective stances; they play many other roles in written communication, such as showing author intention, sociocultural differences, and author identity (Derks, Bos, and Von Grumbkow 2007; Schnoebelen 2012; Park et al. 2013). In particular, Dresner and Herring (2010) situate the usage of emoticons in CMC between the extremes of nonlanguage and language.
We hypothesize that individuals who adopt emojis tend to use fewer emoticons, indicating that emojis are replacing this particular form of orthographic paralinguistic communication. We use a matching approach to causal inference to test our hypothesis using observational data from Twitter. Next, we describe the dataset and our study design, and report results. We then briefly discuss related work and conclude with a discussion and future work.
Figure 2: Example tweets using emoji characters
Footnote 1: http://www.unicode.org/reports/tr51/index.html#Selection_Factors
Footnote 2: http://www.emojitracker.com/
arXiv:1510.08480v1 [cs.CL] 28 Oct 2015
Dataset
We gathered a corpus of tweets from February 2014 to August 2015, using Twitter’s streaming API. We removed retweets (repetitions of previously posted messages) by excluding messages which contain the “retweeted status” metadata or the “RT” token. We included only authors who wrote at least five tweets per month on average, and removed authors who wrote more than 10% of their tweets in any language other than English.
Extracting Emoji and Emoticon Tokens
To extract emoji characters from tweets, we converted the messages into their Unicode representation and used regular expressions to extract Unicode characters in the ranges of the “Emoji & Pictographs” category of Unicode symbols (other categories include non-Roman characters such as different numbering systems and mathematical symbols).
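As a rough illustration of range-based extraction of this kind, here is a minimal Python sketch. The code-point ranges, the name `EMOJI_RANGES`, and the function `extract_emojis` are assumptions made for the example; the text above does not list the exact ranges used in the study.

```python
import re
from typing import List

# Assumed code-point ranges, chosen for illustration only; the exact ranges
# used in the study are not given in the text.
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons block (emoji faces)
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
]

# Build one character class such as "[\u2600-\u26ff...]" from the ranges.
EMOJI_PATTERN = re.compile(
    "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in EMOJI_RANGES) + "]"
)


def extract_emojis(text: str) -> List[str]:
    """Return every character of `text` that falls in one of the assumed ranges."""
    return EMOJI_PATTERN.findall(text)
```

Calling `extract_emojis` on each tweet and collecting the results in a set would give a count of unique emoji characters for a sample. A character-level pattern like this one ignores variation selectors and multi-code-point sequences, so it is only a first approximation.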
Using this method we identified 1,235 unique emoji characters in a random sample of tweets spanning a period of more than a year (February 2014 to August 2015). Figure 3a shows the percentage of emoji character tokens (i.e., (# of emoji tokens / # of total tokens) × 100%) over time in our sample of mostly English tweets [3].
As there is no comprehensive list of Twitter emoticons (and new emoticons are introduced over time), we used a data-driven approach to identify emoticons. We constructed regular expressions (e.g., two or more characters with at least one non-alphanumeric character, not containing money/percent/time symbols, etc.) to retrieve an initial set of emoticon-like tokens, and then manually annotated the items that together accounted for 95% of the cumulative frequency of emoticon-like tokens, looking at their usage in random examples of tweets. After removing tokens that are not used as emoticons, there were 44 and 52 unique emoticons extracted from tweets of March 2014 and March 2015, respectively. In both cases, the twenty most frequent emoticons made up 90% of all emoticon tokens. Figure 3b shows the percentage of emoticon symbols (i.e., (# of emoticon tokens / # of total tokens) × 100%) over time.
Footnote 3: Note that although there is a decreasing trend in the emoji usage rate after a peak in June-August 2014 in this sample, the emoji usage rate shows an upward trend in a sample of unfiltered tweets, indicating an increasing popularity of emojis on Twitter.
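To make the usage-rate formula and the matched comparison from the introduction more concrete, the following Python sketch computes a percentage rate, pairs each treatment author with the closest unused control author by pre-treatment emoticon rate, and averages the difference in change between the pairs. The function names and the greedy one-to-one nearest-neighbour strategy are assumptions for illustration; the text does not specify which matching algorithm was used.

```python
from typing import Dict, List, Tuple


def usage_rate(n_symbol_tokens: int, n_total_tokens: int) -> float:
    """Percentage of tokens that are emoticons (or emojis)."""
    if n_total_tokens == 0:
        return 0.0
    return 100.0 * n_symbol_tokens / n_total_tokens


def match_authors(
    treated_pre: Dict[str, float], control_pre: Dict[str, float]
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching on pre-treatment rate.

    Assumption: each control author is used at most once (no replacement).
    """
    available = dict(control_pre)
    pairs = []
    for t_id, t_rate in sorted(treated_pre.items()):
        if not available:
            break  # more treated authors than controls
        # Closest remaining control; ties are broken by author id.
        c_id = min(available, key=lambda c: (abs(available[c] - t_rate), c))
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs


def mean_difference_in_change(
    pairs: List[Tuple[str, str]], pre: Dict[str, float], post: Dict[str, float]
) -> float:
    """Average of (treated change - control change) over matched pairs."""
    diffs = [(post[t] - pre[t]) - (post[c] - pre[c]) for t, c in pairs]
    return sum(diffs) / len(diffs)
```

A negative value from `mean_difference_in_change` would mean that, on average, treatment authors reduced their emoticon rate more than their matched controls did, which is the pattern the hypothesis predicts. A greedy pass is the simplest option; an optimal-assignment matcher could be substituted without changing the comparison step.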
Similar resources
Sentiment of Emojis
There is a new generation of emoticons, called emojis, that is increasingly being used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional contents, there are hundreds...
Contrastive Learning of Emoji-based Representations for Resource-Poor Languages
The introduction of emojis (or emoticons) in social media platforms has given the users an increased potential for expression. We propose a novel method called Classification of Emojis using Siamese Network Architecture (CESNA) to learn emoji-based representations of resource-poor languages by jointly training them with resource-rich languages using a siamese network. CESNA model consists of tw...
MojiTalk: Generating Emotional Responses at Scale
Generating emotional language is a key step towards building empathetic natural language processing agents. However, a major challenge for this line of research is the lack of large-scale labeled training data, and previous studies are limited to only small sets of human annotated sentiment labels. Additionally, explicitly controlling the emotion and sentiment of generated text is also difficul...
What does this Emoji Mean? A Vector Space Skip-Gram Model for Twitter Emojis
Emojis allow us to describe objects, situations and even feelings with small images, providing a visual and quick way to communicate. In this paper, we analyse emojis used in Twitter with distributional semantic models. We retrieve 10 millions tweets posted by USA users, and we build several skip gram word embedding models by mapping in the same vectorial space both words and emojis. We test ou...
Emotion Analysis of Twitter Data That Use Emoticons and Emoji Ideograms
Twitter is an online social networking service on which users worldwide publish their opinions on a variety of topics, discuss current issues, complain, and express many kinds of emotions. Therefore, Twitter is a rich source of data for opinion mining, sentiment and emotion analysis. This paper focuses on this issue by analysing symbols called emotion tokens, including emotion symbols (e.g. emo...
Journal title:
- CoRR
Volume: abs/1510.08480
Publication date: 2015